An Investigation of Dirichlet Prior Smoothing’s Performance Advantage

Authors

  • Mark D. Smucker
  • James Allan
Abstract

In the language modeling approach to information retrieval, Dirichlet prior smoothing frequently outperforms Jelinek-Mercer smoothing. Both Dirichlet prior and Jelinek-Mercer are forms of linear interpolated smoothing. The only difference between them is that Dirichlet prior determines the amount of smoothing based on a document’s length. Our hypothesis was that Dirichlet prior’s performance advantage comes from an implicit document prior that favors longer documents. We tested our hypothesis by first calculating a prior for a given document length from the known relevant documents. We then determined the performance of each smoothing method with and without the document prior. We discovered that when given the document prior, Jelinek-Mercer smoothing matches or exceeds the performance of Dirichlet prior smoothing. Dirichlet prior smoothing’s performance advantage appears to come more from an implicit prior favoring longer documents than from better estimation of the document model.
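The relationship between the two methods can be made concrete with a minimal sketch. It uses the standard textbook forms of the two estimators, which the abstract does not spell out: Jelinek-Mercer mixes the document and collection models with a fixed weight, while Dirichlet prior is the same interpolation with a weight of mu / (|d| + mu) that shrinks as the document gets longer. The function names, the parameter values (`lam=0.5`, `mu=1000.0`), and the `log_doc_prior` argument are illustrative assumptions, not the authors' implementation; how the paper estimates its length-based prior is not described here, so the prior is left as a caller-supplied placeholder.

```python
import math
from collections import Counter


def jelinek_mercer(tf, doc_len, p_coll, lam=0.5):
    """Fixed-weight interpolation; lam is the weight on the collection model."""
    return (1.0 - lam) * tf / doc_len + lam * p_coll


def dirichlet_prior(tf, doc_len, p_coll, mu=1000.0):
    """Dirichlet prior smoothing with pseudo-count mass mu."""
    return (tf + mu * p_coll) / (doc_len + mu)


def dirichlet_lambda(doc_len, mu=1000.0):
    """Collection-model weight that Dirichlet prior implies for a document."""
    return mu / (doc_len + mu)


def query_log_likelihood(query, doc, p_coll, smooth, log_doc_prior=0.0):
    """Score a document as log P(d) + sum over query terms of log p(w | d).

    query and doc are lists of terms; p_coll maps each term to its
    collection probability. log_doc_prior is a hypothetical hook for an
    explicit document prior (0.0 means a uniform prior).
    """
    counts = Counter(doc)
    n = len(doc)
    return log_doc_prior + sum(
        math.log(smooth(counts[w], n, p_coll[w])) for w in query
    )


if __name__ == "__main__":
    # Longer documents get less smoothing under Dirichlet prior.
    for length in (100, 500, 2000):
        print(length, round(dirichlet_lambda(length), 3))
```

Under this formulation, a Jelinek-Mercer ranker with a nonzero `log_doc_prior` that grows with document length is one way to approximate the kind of comparison the abstract describes, although the exact experimental setup would need to be taken from the full paper.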

Similar resources

Introducing of Dirichlet process prior in the Nonparametric Bayesian models frame work

Statistical models are used to learn about the mechanism that generates the data. It is often assumed that the random variables y_i, i = 1, …, n, are samples from a probability distribution F that belongs to a parametric class of distributions. However, in practice, a parametric model may be inappropriate to describe the data. In this setting, the parametric assumption could be r...

Preparation of Polyvinylchloride Nanofiltration Membrane: Investigation the Effect of Thickness, Prior Evaporation Time and Addition Polyethylenglchol as Additive on Membrane Performance and Properties

In this work, polyvinylchloride (PVC) membrane prepared via casting solution technique and phase inversion method. N, N dimethylacetamide (DMAC) was used as primary solvent and Tetrahydrofuran (THF) was used as a co-solvent. The effects of parameters such as membrane thickness, evaporation time of casting film before immersion precipitation and addition of polyethylenglchol (PEG) on PVC membran...

Crouching Dirichlet, Hidden Markov Model: Unsupervised POS Tagging with Context Local Tag Generation

We define the crouching Dirichlet, hidden Markov model (CDHMM), an HMM for part-of-speech tagging which draws state prior distributions for each local document context. This simple modification of the HMM takes advantage of the dichotomy in natural language between content and function words. In contrast, a standard HMM draws all prior distributions once over all states and it is known to perfor...

Experimental Investigation of Thermal Performance in an Advanced Solar Collector with Spiral Tube

This paper reports the thermal performance of a new cylindrical solar collector based on an experimental investigation with this difference that instead of the collector tube with absorbent coating, coil into a spiral copper tube is placed in the center of the collector. The spiral shape of the tube, heat transfer without disruption or increase the heat transfer area, is increasing. In this cas...

An Investigation into the Effect of Hydrotalcite Calcination Temperature on the Catalytic Performance of Mesoporous Ni-MgO-Al2O3 Catalyst in the Combined Steam and Dry Reforming of Methane

Several mesoporous nickel-based catalysts with MgO-Al2O3 as the catalyst support were prepared using a co-precipitation method at a constant pH. The supports were prepared from the decomposition of an Mg-Al hydrotalcite-like structure which had already been prepared with Mg/Al=1. Prior to impregnating 10 wt.% nickel on the supports, the precursor was decomposed at several ...

Publication date: 2005